AFRL-AFOSR-JP-TR-2016-0014 Bio-Inspired Human-Level Machine Learning

Author

  • Byoung-Tak Zhang
Abstract

How can brain computation be so fast, flexible, and robust? What representational and organizational principles enable the biological brain to learn so efficiently and flexibly on the sub-second time scale, and so reliably over a lifetime? To understand these principles, we aimed to develop human-level machine learning technology that is fast, flexible, and reliable enough to adapt to a continuously changing, dynamic environment. Based on dynamic “neural” populations (neural assemblies), we constructed a “human-like” machine learning model and implemented it in “molecular” populations (molecular assemblies) using in vitro DNA computing. In the first year, we developed dynamic hypernetwork models of neural populations in a sequential Bayesian framework for lifelong learning. In the second year, we extended them to the molecular dynamic hypernetwork model and designed in vitro experimental protocols to implement online language learning from a stream of text. In the third year, we demonstrated the use of molecular dynamic hypernetworks for multimodal visuo-linguistic concept learning from a long stream of video data, and their extension to high-level cognitive functions such as anagram solving. We expect that bio-inspired human-level machine learning combined with molecular-computing implementation can offer an interesting, novel paradigm for flexible and reliable computing.

Introduction: One of the main challenges in artificial intelligence is to develop human-like machine learning technology that is fast, flexible, and reliable enough to adapt to a continuously changing, dynamic environment. Converging neuroanatomical and neurophysiological evidence shows that the brain uses distributed, overlapping representations based on sparse population codes that are coordinated dynamically (Averbeck et al., 2006; Pouget et al., 2000; von der Malsburg et al., 2010).
DISTRIBUTION A: Distribution approved for public release.

We hypothesize that brain computation exploits the huge number of degrees of freedom generated by a large number of memory units, ranging from neurotransmitters and neurons to cell assemblies, organized into multiscale complex networks in space and coordinated dynamically in time (Caroni, 2012; Freeman, 2000). The objective of this project is to build a learning-friendly computational model based on dynamic neural populations and to implement this model in self-assembling molecular populations using DNA computing. A key idea underlying this approach is that the plasticity of neural populations in the brain is based on molecular interactions at the physico-chemical level; thus, molecular computational processes can naturally simulate human-like learning and memory. The molecular self-assembly mechanisms of DNA chemistry provide a natural, physical medium for modeling dynamic “neural” populations (neural assemblies). The massive parallelism of in vitro DNA computing provides a convenient tool for dealing with large populations: a nanomole contains roughly 6 × 10^14 molecules, more than the roughly 10^11 neurons and 10^14 synaptic connections in the human brain. In previous work, we experimentally demonstrated the feasibility of cognitive memory with DNA self-assembly. We showed that wet DNA computing can implement weighted-sum operations, which are fundamental to pattern classification (Lim et al., 2010). Since pattern classification underlies many cognitive tasks, this work opened a new way of creating flexible cognitive memories in vitro with molecules. We also demonstrated that the molecular self-assembly model can automatically build associative language models from language data and use them to generate sentences (Lee et al., 2011). On the mathematical and computational modeling side, we developed a probabilistic graphical model of sparse, random population codes called hypernetworks (Zhang, 2008).
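The hypernetwork idea mentioned above, a population of sparse, random codes, can be illustrated in a few lines of software. The sketch below is a toy analogue under our own assumptions, not the model of (Zhang, 2008) or its DNA implementation: it samples random order-k hyperedges (subsets of variable-value pairs) from each observed example, keeps their copy numbers as weights, and recalls a missing variable by letting every hyperedge consistent with a partial query vote with its weight. The class `Hypernetwork`, the parameters `order` and `n_samples`, and the voting rule are hypothetical.

```python
import random
from collections import Counter


def sample_hyperedges(example, order, n_samples, rng):
    """Draw random order-k hyperedges, i.e. subsets of (variable, value) pairs, from one example."""
    items = sorted(example.items())
    return [frozenset(rng.sample(items, order)) for _ in range(n_samples)]


class Hypernetwork:
    """Toy hypernetwork memory: a weighted population of sampled hyperedges (hypothetical sketch)."""

    def __init__(self, order=2, n_samples=50, seed=0):
        self.order = order
        self.n_samples = n_samples
        self.rng = random.Random(seed)
        self.edges = Counter()  # hyperedge -> weight (loosely analogous to a copy number)

    def observe(self, example):
        """Incrementally add hyperedges sampled from a new observation; the example itself is not kept."""
        self.edges.update(sample_hyperedges(example, self.order, self.n_samples, self.rng))

    def complete(self, partial, target):
        """Recall `target`: hyperedges whose other members all match `partial` vote with their weight."""
        known = set(partial.items())
        votes = Counter()
        for edge, weight in self.edges.items():
            values = dict(edge)
            if target in values and edge - {(target, values[target])} <= known:
                votes[values[target]] += weight
        return votes.most_common(1)[0][0] if votes else None


memory = Hypernetwork()
memory.observe({"w1": "i", "w2": "love", "w3": "you"})
memory.observe({"w1": "i", "w2": "miss", "w3": "home"})
print(memory.complete({"w2": "love"}, "w3"))
```

Because memory is just a multiset of small hyperedges, learning from a new example only adds elements to the population, which is what makes this representation a natural fit for a molecular (self-assembling) implementation.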
The model was also applied to visually grounded language learning (Zhang et al., 2012), where cognitive memory consists of multimodal compound concepts that are encoded as hyperedges (molecular memory particles) and then assembled, disassembled, and reassembled to adapt incrementally as the video sequences are observed. However, several challenges remained on the way to human-level learning and memory. First, the concept of population coding needed to be extended to online, predictive learning in a changing environment. Second, representational formalisms, and the translations between neural populations and molecular populations, needed to be investigated. Third, DNA computing and molecular learning technology needed to be scaled up to enable molecular computational simulation at the whole-brain scale, making cognitive learning and human-level machine learning possible. In the first year of the project, we focused on constructing mathematical theories of dynamic neural populations. Building upon our previous work on hypernetwork models of cognitive learning and memory (Zhang, 2012), we developed population-coded dynamic hypernetwork models of lifelong learning in a non-stationary, changing environment [1, 2, 6, 8, 9, 17]. In [9], we discussed our model from the perspectives of embodied cognition, multisensory integration, cognitive dynamics, the perception-action cycle, and lifelong learning. We developed a sequential Bayesian framework for lifelong learning, built a taxonomy of lifelong-learning paradigms, and examined information-theoretic objective functions for each paradigm, with an emphasis on active learning. Also, in [7], we showed that DNA hybridization can be modeled as computing the inner product between embedded vectors in a corresponding vector space, and proposed an algorithm that learns a binary classifier in this vector space.
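The sequential Bayesian framework mentioned above rests on one property: the posterior after each batch of data becomes the prior for the next, p(θ | D_1..t) ∝ p(D_t | θ) p(θ | D_1..t-1), so past data never need to be revisited. The sketch below shows this with a Beta-Bernoulli model, chosen purely for illustration as the simplest conjugate case; it is not the framework of [9], and the class name `BetaBelief` and the toy observation stream are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief over the probability of a binary event (illustrative assumption)."""
    a: float = 1.0
    b: float = 1.0

    def update(self, observations):
        """Posterior after a batch of 0/1 observations; it is used as the prior for the next batch."""
        ones = sum(observations)
        return BetaBelief(self.a + ones, self.b + len(observations) - ones)

    @property
    def mean(self):
        return self.a / (self.a + self.b)


# Data arrive in episodes and are discarded after use; only the belief is carried forward.
stream = [[1, 0, 1], [1, 1], [0, 1, 1, 1]]
belief = BetaBelief()
for episode in stream:
    belief = belief.update(episode)

# The result matches processing all data at once, although the episodes were never stored.
batch = BetaBelief().update([x for episode in stream for x in episode])
print(belief, round(belief.mean, 3))
```

With conjugate models the carried-forward state is a handful of numbers; for richer models such as hypernetworks, the population of weighted hyperedges plays the role of this compact summary.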
In the second year, we extended this to the molecular dynamic hypernetwork model and designed in vitro experimental protocols to implement online language learning from a stream of text [3, 4, 10, 14, 19, 20, 23]. To measure the difference between sequences encoding different information, we used symmetric internal loops in double-stranded DNA, which allowed similar and different patterns to be recognized. Through a series of training steps, each of which simply stores the given training data in the microtube for the corresponding class of the hypernetwork, we observed that sentence classification accuracy increased on a corpus of TV show dialogue and that our molecular learning generalized from the training sentences. In the third year, we demonstrated the use of molecular dynamic hypernetworks for multimodal visuo-linguistic concept learning from a long stream of video data. Motivated by the cognitive developmental process by which children construct visually grounded concepts from multimodal stimuli (Meltzoff, 1990), we proposed a hierarchical model that automatically constructs visual-linguistic knowledge by dynamically learning concepts represented with vision and language from videos [8, 12, 15, 16, 22]. We developed a stochastic method for graph construction, a graph Monte Carlo algorithm, with which our model learns concepts while observing new videos, robustly tracking concept drift and continuously accumulating new conceptual knowledge. Using a series of approximately 200 episodes of educational cartoon videos, we examined the emergence and evolution of the concept hierarchies as the video stories unfold. We observed that the numbers of visual and linguistic nodes tend to increase, because the concepts continuously develop while the videos are observed.
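The interplay of stochastic graph construction and weight updates that lets the model track concept drift can be sketched as follows. This is a loose illustration under our own assumptions, not the published graph Monte Carlo algorithm: each observed picture-sentence pair contributes randomly sampled word-patch edges (order 2 for brevity, whereas the model uses higher-order hyperedges), every existing weight first decays by a hypothetical forgetting factor `decay`, and edges lighter than `prune_below` are removed. The class `ConceptGraph` and all word and patch labels are invented for the example.

```python
import random
from collections import defaultdict


class ConceptGraph:
    """Toy stochastic word-patch graph with decay and pruning (hypothetical sketch)."""

    def __init__(self, edges_per_pair=10, decay=0.9, prune_below=0.05, seed=0):
        self.edges_per_pair = edges_per_pair  # random samples drawn per observation
        self.decay = decay                    # forgetting factor applied before each update
        self.prune_below = prune_below        # edges lighter than this are discarded
        self.rng = random.Random(seed)
        self.weights = defaultdict(float)     # (word, patch) -> weight

    def observe(self, words, patches):
        # Fade every existing edge so that concepts which stop appearing gradually disappear.
        for edge in list(self.weights):
            self.weights[edge] *= self.decay
            if self.weights[edge] < self.prune_below:
                del self.weights[edge]
        # Sample new word-patch edges from the current picture-sentence pair.
        for _ in range(self.edges_per_pair):
            edge = (self.rng.choice(words), self.rng.choice(patches))
            self.weights[edge] += 1.0

    def concept_distribution(self):
        """Normalize edge weights into a probability distribution over word-patch pairs."""
        total = sum(self.weights.values())
        return {edge: w / total for edge, w in self.weights.items()}

    def words(self):
        return {word for word, _ in self.weights}


graph = ConceptGraph()
graph.observe(["pororo", "snow"], ["patch_penguin", "patch_white"])
print(sorted(graph.words()))
for _ in range(60):  # the scene changes and a new concept is observed repeatedly
    graph.observe(["crong", "egg"], ["patch_dino", "patch_green"])
print(sorted(graph.words()))
```

Decay plus pruning keeps the graph bounded while letting recent, frequently re-observed associations dominate the concept distribution; a real system would need a principled choice of the forgetting rate rather than the arbitrary values used here.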
Also, we presented a molecular computational model of human anagram solving to show its potential for application to high-level cognitive functions [5, 11, 13, 18, 21]. Our major contribution is the molecular assembly model of cognitive memory and learning, which can be used as a tool for simulating the cognitive dynamics involved in multisensory cue integration, grounded concept learning, and the interaction of vision and language. We believe that bio-inspired human-level machine learning combined with molecular-computing implementation can offer an interesting, novel paradigm for flexible and reliable computing. We also expect that these cognitive memory architectures and their learning algorithms will help revolutionize AI technology for lifelong-learning, self-organizing sensorimotor systems.

[1 Year] The Dynamic Hypernetwork Models of Neural Populations

Experiments: In the first year, we constructed a dynamic Bayesian inference framework and examined information-theoretic objective functions for lifelong learning [9]. In lifelong learning, training data are observed sequentially as learning unfolds and are not kept for iterative reuse. Learning proceeds in an online, incremental manner over an extended period in a changing environment. This requires incremental transfer of the knowledge acquired in previous learning to future learning, which can be formulated as Bayesian inference. We applied a sequential Bayesian framework for lifelong learning to build a taxonomy of lifelong-learning paradigms and examined information-theoretic objective functions for each paradigm (Figure 1).

Figure 1.
Lifelong learning with action-perception-learning cycle [9]

Results and Discussion: We distinguished three paradigms of lifelong learning: learning with passive, continual observations; learning with actions (but without reward feedback); and active learning with explicit rewards. For each paradigm we examined the objective functions of the corresponding lifelong-learning style: prediction errors and predictive information; empowerment, which measures how much influence an agent has on its environment; and the value function, or expected reward, of the agent. We believe the general framework and the objective functions for lifelong learning can provide a baseline for evaluating the representations and strategies of learning algorithms. Specifically, the objective functions can be used to develop new algorithms for the discovery, revision, and transfer of knowledge by lifelong learners over an extended period of experience. Our emphasis on information-theoretic active and predictive learning, with minimal mechanistic assumptions about model structure, can be especially fruitful for automated knowledge acquisition and sequential knowledge transfer across a wide range of similar but significantly different tasks and domains. On the theoretical side, we also presented a computational learning method for bio-molecular classification [7]. In this study, we showed how to design biochemical operations for both learning and pattern classification. DNA hybridization is modeled as computing the inner product between embedded vectors in a corresponding vector space (Figure 2), and our algorithm learns a binary classifier in this vector space. Our algorithm manipulates populations of DNA sequences via hybridization and denaturing operations, modifying the distributions of the associated vectors in the kernel feature space.
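To make the inner-product view of hybridization concrete, the sketch below one-hot embeds equal-length DNA strands so that the inner product of two embeddings counts the positions at which they agree, i.e. the Watson-Crick pairs one strand would form with the complement of the other (ignoring strand orientation, shifted alignments, and thermodynamics). On top of this kernel it trains an ordinary kernel perceptron, a swapped-in textbook technique rather than the algorithm of [7]; its dual weights `alpha` are only loosely analogous to the copy numbers of training strands in the molecular population. The sequences, labels, and function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def embed(seq: str) -> np.ndarray:
    """One-hot embedding: one 4-dimensional block per position."""
    v = np.zeros(4 * len(seq))
    for i, base in enumerate(seq):
        v[4 * i + BASES.index(base)] = 1.0
    return v


def kernel(x: str, y: str) -> float:
    """Inner product = number of agreeing positions = base pairs x forms with the complement of y."""
    return float(embed(x) @ embed(y))


def train(seqs, labels, epochs=10):
    """Kernel perceptron: alpha[i] counts how often example i was added to the 'population'."""
    K = np.array([[kernel(a, b) for b in seqs] for a in seqs])
    y = np.asarray(labels, dtype=float)
    alpha = np.zeros(len(seqs))
    for _ in range(epochs):
        mistakes = 0
        for i in range(len(seqs)):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # misclassified (or undecided)
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(seqs, labels, alpha, query):
    score = sum(a * lab * kernel(s, query) for s, lab, a in zip(seqs, labels, alpha))
    return 1 if score > 0 else -1


train_seqs = ["AAAA", "AAAT", "CCCC", "CCCG"]
train_labels = [1, 1, -1, -1]
alpha = train(train_seqs, train_labels)
print(alpha)
print(predict(train_seqs, train_labels, alpha, "AATA"), predict(train_seqs, train_labels, alpha, "CCGC"))
```

Classification then needs only kernel evaluations against the retained strands, which mirrors the idea that an unknown molecule is classified directly by hybridizing it against the learned population.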
After learning is performed on data examples, an unknown DNA sequence molecule can be classified directly using the learned weights in the molecular population. We analyzed the thermodynamic behavior of these learning algorithms, showed simulations on artificial and real datasets, and demonstrated preliminary wet-lab results using gel electrophoresis. In our classification results on generated data (Figure 3), points in a two-dimensional space are labeled with one of two classes, shown in yellow and blue. In this space, the binding energy is given by the Euclidean distance between pairs of points. The contours represent various hybridization amounts and change according to the annealing temperature schedule. This shows how controlling the hybridization schedule influences both the positive definiteness and the sparsity of the resulting kernel matrices. With sufficient annealing, as in Figure 3(a), the kernel is positive definite. In Figure 3(c), with no annealing, the kernel is not positive definite, resulting in poor classification. With high-temperature hybridization, as in Figure 3(b), the kernel matrix is positive definite but strongly diagonally dominant and sparse. In this case, the hybridization contours show that the decision surface depends more specifically on nearest neighbors than the decision surface in Figure 3(a) does. Such a sparse kernel matrix would be more vulnerable to noise in the training data. The ROC curves showed that the classification performance of our proposed method is superior to that of kFDA and better than that of the SVM (Figure 4).

Figure 2. DNA sequence mapped into a vector space by an inner product [7]

Figure 3. Classification of two-class data learned with different temperature schedules [7]: (a) 80 °C to 20 °C, (b) 80 °C constant, and (c) 30 °C constant

Figure 4.
Classification results obtained with DNA learning, SVMs, and kernel FDA using the same DNA kernel [7]

[2 Year] DNA-Computing Implementations of the Dynamic Hypernetwork Models

Experiments: In the second year, we developed an in vitro molecular machine learning model using symmetric internal loops of double-stranded DNA [23]. To enable the molecules to learn, a way of measuring differences between sequences was needed. By exploiting mismatches during hybridization, we encoded information into DNA sequences (Figure 5) and designed the sequences to produce symmetric internal loops when hybridized with different sequences of the same length (Figure 6). These mismatches were used to determine the distances between given instances, which is essential for recognizing similar or different patterns.

Figure 5. Encoding sentences into DNA sequences in the structure of a hyperedge

Figure 6. Symmetric internal loops of double-stranded DNA in sentences [23(submitted)]

The training process simply stores the given training data in a separate microtube for each class of the hypernetwork. When a new example is presented, similar and identical instances are retrieved from the hypernetworks and used to classify it. The classification is read out by gel electrophoresis, comparing the relative intensities of the bands (Figure 7). The band intensity represents the probability that the test example belongs to that class; that is, a higher band intensity means a higher probability that the sentence belongs to the corresponding class.

Figure 7. Classification after training [23(submitted)]

Figure 8. The proposed molecular machine learning process [23(submitted)]

To evaluate the model, DNA molecules were trained using a set of sentences obtained from a corpus of TV drama dialogue and tested using a set of unknown sentences from the same corpus (Figure 8).
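The storage-and-retrieval scheme described above can be mimicked in software. The sketch assumes that each sentence is encoded as a fixed number of word slots (as in a hyperedge), that every mismatched slot forms one symmetric internal loop, and that a duplex stays hybridized only if it has at most `max_loops` loops; the fraction of stored strands in each class's tube that remain hybridized stands in for the relative band intensity. All names, the threshold, and the toy sentences are hypothetical, and real duplex stability depends on sequence and temperature, which this sketch ignores.

```python
def loop_distance(a, b):
    """Number of mismatched word slots; each is assumed to form one symmetric internal loop."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same number of slots")
    return sum(x != y for x, y in zip(a, b))


class TubeClassifier:
    """One 'microtube' per class; training simply stores the encoded instances."""

    def __init__(self, max_loops=1):
        self.max_loops = max_loops  # duplexes with more loops are assumed to melt apart
        self.tubes = {}

    def train(self, instance, label):
        self.tubes.setdefault(label, []).append(tuple(instance))

    def band_intensities(self, query):
        """Relative 'band intensity' per class: share of stored strands that stay hybridized."""
        counts = {
            label: sum(loop_distance(query, s) <= self.max_loops for s in stored)
            for label, stored in self.tubes.items()
        }
        total = sum(counts.values())
        return {label: (c / total if total else 0.0) for label, c in counts.items()}

    def classify(self, query):
        intensities = self.band_intensities(query)
        return max(intensities, key=intensities.get)


clf = TubeClassifier(max_loops=1)
for words, label in [(("i", "love", "you"), "A"), (("we", "love", "you"), "A"),
                     (("get", "out", "now"), "B"), (("run", "out", "now"), "B")]:
    clf.train(words, label)

print(clf.band_intensities(("they", "love", "you")))
print(clf.classify(("go", "out", "now")))
```

As in the wet protocol, "training" is pure storage; all of the generalization comes from the tolerance to a limited number of mismatches at read-out time.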
We collected sentences from the TV dramas ‘Friends’ and ‘Prison Break’ for training and testing the DNA hypernetwork models. We designed DNA sequences implementing a molecular hypernetwork model of language that can distinguish whether sentences come from Friends or Prison Break. A 20-sentence classification experiment was conducted to evaluate the feasibility of population-coding-based molecular learning of language concepts. Ten of the 20 sentences (5 from Friends, 5 from Prison Break) were used for training, and the other 10 sentences (5 from Friends, 5 from Prison Break) were used for testing.

Results and Discussion: The results of our experiments showed that the molecular learning machine was able to generalize from the training sentences (Figure 9). We summed the correctly classified examples in each classification test and present these results as bar graphs (Figure 10). The hypernetwork was trained gradually and tested at each step. Despite low accuracy in the initial training phase, accuracy increased to 100% by the end of the training process in each case.

Figure 9. Verification of training steps by classification of (A) Friends and (B) Prison Break training data [23(submitted)]

Figure 10. Accuracy of the classification of test and training examples in each training step [23(submitted)]

The major contribution of this work is the in vitro implementation of a machine learning algorithm exploiting symmetric internal loops. We verified each molecular learning step and performed classification experiments using the test data, which allowed us to demonstrate generalization. Given the generality of machine learning, our novel molecular learning machine could in principle be applied to other problems, such as text mining and molecular recognition in biology, if the data can be properly encoded in DNA molecules.
[3 Year-1] Molecular Dynamic Hypernetworks for Multimodal Concept Learning

Experiments: In the third year, we applied the molecular dynamic hypernetwork models to learning multimodal vision-language concepts from videos. The resulting model, called deep concept hierarchy (DCH) [16], consists of two or more concept layers and one layer of multiple modalities (Figure 11). Each concept layer is represented by a hypergraph structure, which enables multiple levels of concepts to be represented by the probability distribution of the visual-textual variables (Figure 12). Higher concept layers represent more abstract concepts than lower layers, and the modality layer contains populations of many microcodes encoding higher-order relationships among two or more visual and textual variables. Each concept is represented as a probability distribution over word-patch appearances.

Figure 11. Architecture of deep concept hierarchy [16]

Figure 12. Example of deep concept hierarchy learned from Pororo videos [16]

To efficiently search the huge space of vision-language concepts, we developed a stochastic method for graph construction, a graph Monte Carlo (graph MC) algorithm. DCH incrementally learns concepts through graph MC and the weight update process while observing new videos, thereby robustly tracking concept drift and continuously accumulating new conceptual knowledge, which makes it suitable for deployment in lifelong learning environments. To verify the proposed model, we conducted experiments on the cartoon video series “Pororo”, consisting of 183 episodes with 1,232 minutes of playing time. As training and test data, 16,000 picture-sentence pairs were prepared, and we ran cognitive developmental experiments using population-coded hypernetwork

Publication date: 2016